UCB Algorithm for Exponential Distributions

Authors

  • Wassim Jouini
  • Christophe Moy
Abstract

We introduce in this paper a new algorithm for Multi-Armed Bandit (MAB) problems, a machine learning paradigm popular within Cognitive Network related topics (e.g., Spectrum Sensing and Allocation). We focus on the case where the rewards are exponentially distributed, which is common when dealing with Rayleigh fading channels. This strategy, named Multiplicative Upper Confidence Bound (MUCB), associates a utility index with every available arm and then selects the arm with the highest index. For every arm, the associated index is equal to the product of a multiplicative factor and the sample mean of the rewards collected from this arm. We show that the MUCB policy has low complexity and is order optimal.

Index Terms: Learning, Multi-armed bandit, Upper Confidence Bound Algorithm, UCB, MUCB, exponential distribution.
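
The abstract pins down the selection rule (index = multiplicative factor × sample mean, play the argmax) but not the factor itself. The Python sketch below is a minimal illustration of that rule under stated assumptions: the form of `mucb_factor` (1/(1 − √(α ln t / n_k)), clipped), the constant α, and the toy exponential-reward loop in `run_mucb` are hypothetical stand-ins, not the paper's definitions.

```python
import math
import random

def mucb_factor(t, n_k, alpha=2.0):
    # Hypothetical multiplicative factor (NOT the paper's exact form):
    # it is at least 1 and grows when arm k is under-sampled at round t.
    c = math.sqrt(alpha * math.log(t) / n_k)
    return 1.0 / (1.0 - min(c, 0.5))  # clip so the factor stays finite

def mucb_select(means, counts, t, alpha=2.0):
    # Rule from the abstract: index = factor * sample mean; play the argmax.
    idx = [mucb_factor(t, n, alpha) * m for m, n in zip(means, counts)]
    return max(range(len(idx)), key=idx.__getitem__)

def run_mucb(rates, horizon, alpha=2.0):
    # Exponential rewards with mean 1/rate, as in a Rayleigh fading model.
    k = len(rates)
    sums, counts = [0.0] * k, [0] * k
    for t in range(1, horizon + 1):
        if t <= k:                      # pull each arm once to initialize
            arm = t - 1
        else:
            means = [s / n for s, n in zip(sums, counts)]
            arm = mucb_select(means, counts, t, alpha)
        reward = random.expovariate(rates[arm])
        sums[arm] += reward
        counts[arm] += 1
    return [s / n for s, n in zip(sums, counts)]
```

Because the factor multiplies the sample mean rather than adding a padding term, under-sampled arms get their empirical means inflated, which plays the same exploratory role as the additive confidence width in classical UCB.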

Similar articles

On Bivariate Generalized Exponential-Power Series Class of Distributions

In this paper, we introduce a new class of bivariate distributions by compounding the bivariate generalized exponential and power-series distributions. This new class contains the bivariate generalized exponential-Poisson, bivariate generalized exponential-logarithmic, bivariate generalized exponential-binomial and bivariate generalized exponential-negative binomial distributions as specia...
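
To make the compounding construction concrete, here is a small illustrative sampler. It rests on assumptions not spelled out in this snippet: the Marshall-Olkin-type bivariate generalized exponential (X = max(U1, U3), Y = max(U2, U3) with U_i ~ GE(a_i, λ)), a zero-truncated Poisson as the power-series law (the "Poisson" member of the class), and componentwise maxima over the N replicates as the compounding step; the paper's exact definitions may differ.

```python
import math
import random

def ge_sample(alpha, lam):
    # Generalized exponential GE(alpha, lam) by inversion of the CDF
    # F(x) = (1 - exp(-lam * x)) ** alpha.
    u = random.random()
    return -math.log(1.0 - u ** (1.0 / alpha)) / lam

def bvge_sample(a1, a2, a3, lam):
    # Assumed Marshall-Olkin-type bivariate generalized exponential:
    # X = max(U1, U3), Y = max(U2, U3), U_i ~ GE(a_i, lam).
    u1, u2, u3 = ge_sample(a1, lam), ge_sample(a2, lam), ge_sample(a3, lam)
    return max(u1, u3), max(u2, u3)

def zero_truncated_poisson(theta):
    # N ~ Poisson(theta) conditioned on N >= 1, via Knuth's sampler
    # plus rejection of zeros (the assumed power-series count).
    while True:
        n, p, limit = 0, 1.0, math.exp(-theta)
        while p > limit:
            n += 1
            p *= random.random()
        if n - 1 >= 1:
            return n - 1

def compound_bvge_poisson(a1, a2, a3, lam, theta):
    # Assumed compounding step: componentwise maximum of N i.i.d.
    # bivariate GE pairs, with N drawn from the power-series law.
    n = zero_truncated_poisson(theta)
    pairs = [bvge_sample(a1, a2, a3, lam) for _ in range(n)]
    return max(x for x, _ in pairs), max(y for _, y in pairs)
```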

The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond

This paper presents a finite-time analysis of the KL-UCB algorithm, an online, horizon-free index policy for stochastic bandit problems. We prove two distinct results: first, for arbitrary bounded rewards, the KL-UCB algorithm satisfies a uniformly better regret bound than UCB and its variants; second, in the special case of Bernoulli rewards, it reaches the lower bound of Lai and Robbins. Furth...
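
For the Bernoulli case the KL-UCB index has a compact implementation: the upper confidence bound for an arm is the largest q with n·kl(p̂, q) within an exploration budget, found by bisection since kl(p̂, ·) is increasing on [p̂, 1]. The sketch below follows the standard statement of the algorithm; the constant c and the tolerance are tunables, and the defaults here are illustrative.

```python
import math

def bernoulli_kl(p, q):
    # KL divergence between Bernoulli(p) and Bernoulli(q).
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

def klucb_index(p_hat, n, t, c=3.0, tol=1e-6):
    # Largest q in [p_hat, 1] with n * kl(p_hat, q) <= log(t) + c*log(log(t)),
    # located by bisection.
    budget = (math.log(t) + c * math.log(max(math.log(t), 1.0))) / n
    lo, hi = p_hat, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bernoulli_kl(p_hat, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo
```

At each round the policy plays the arm with the largest `klucb_index`; matching the confidence level to the Bernoulli KL geometry is what allows the regret to meet the Lai and Robbins lower bound.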

A minimax and asymptotically optimal algorithm for stochastic bandits

We propose the kl-UCB algorithm for regret minimization in stochastic bandit models with exponential families of distributions. We prove that it is simultaneously asymptotically optimal (in the sense of Lai and Robbins’ lower bound) and minimax optimal. This is the first algorithm proved to enjoy these two properties at the same time. This work thus merges two different lines of research with s...
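
For reference, the two optimality notions invoked here have standard textbook statements (not quoted from this abstract), with R_T the expected regret after T rounds over K arms, Δ_k the suboptimality gap of arm k, ν_k its reward distribution, and ν⋆ that of an optimal arm:

```latex
% Lai & Robbins asymptotic lower bound: every uniformly good policy satisfies
\liminf_{T \to \infty} \frac{R_T}{\log T}
  \;\ge\; \sum_{k \,:\, \Delta_k > 0} \frac{\Delta_k}{\mathrm{KL}(\nu_k, \nu^\star)}

% Minimax optimality: worst-case regret of order
R_T = O\!\left(\sqrt{K T}\right) \quad \text{up to constant or logarithmic factors}
```

Asymptotic optimality controls the per-instance constant in front of log T, while minimax optimality controls the worst case over all instances; the abstract's claim is that a single algorithm achieves both simultaneously.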

Nearly Optimal Adaptive Procedure for Piecewise-Stationary Bandit: a Change-Point Detection Approach

Multi-armed bandit (MAB) is a class of online learning problems in which a learning agent aims to maximize its expected cumulative reward while repeatedly pulling arms with unknown reward distributions. In this paper, we consider a scenario in which the arms’ reward distributions may change in a piecewise-stationary fashion at unknown time steps. By connecting change-detection techniques ...
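
The idea sketched in the abstract, running a change detector alongside the bandit policy and restarting exploration when an alarm fires, can be illustrated as follows. The two-halves mean test, window length w, and threshold below are illustrative stand-ins, not the paper's actual detector or tuning.

```python
from collections import deque

class RestartDetector:
    # Per-arm change monitor: keeps a window of recent rewards and raises
    # an alarm when the older and newer halves of the window disagree.
    def __init__(self, w=50, threshold=0.3):
        self.threshold = threshold
        self.window = deque(maxlen=w)

    def push(self, reward):
        self.window.append(reward)
        h = len(self.window) // 2
        if h < 5:                 # not enough data for a meaningful test
            return False
        data = list(self.window)
        older = sum(data[:h]) / h
        newer = sum(data[h:]) / (len(data) - h)
        if abs(older - newer) > self.threshold:
            self.window.clear()   # alarm: caller resets the arm's statistics
            return True
        return False
```

On an alarm, the UCB statistics (counts and empirical means) of the affected arm, or of all arms, are reset so that the indices re-explore the new stationary segment.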

On Upper-Confidence Bound Policies for Switching Bandit Problems

Many problems, such as cognitive radio, parameter control of a scanning tunnelling microscope or internet advertisement, can be modelled as non-stationary bandit problems where the distributions of rewards change abruptly at unknown time instants. In this paper, we analyze two algorithms designed for solving this issue: discounted UCB (D-UCB) and sliding-window UCB (SW-UCB). We establish an up...
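
Of the two, sliding-window UCB is the simpler to sketch: all statistics are computed over the last τ pulls only, so the estimates track abrupt changes. The index form below (windowed mean plus a padding B·√(ξ log(min(t, τ))/n_k)) follows the usual statement of SW-UCB, but the default constants are illustrative.

```python
import math
from collections import deque

class SlidingWindowUCB:
    # SW-UCB sketch: means and counts are maintained over a window of the
    # last tau rounds, discarding older observations entirely.
    def __init__(self, n_arms, tau=1000, xi=0.6, b=1.0):
        self.tau, self.xi, self.b = tau, xi, b
        self.history = deque()           # (arm, reward) for the last tau rounds
        self.sums = [0.0] * n_arms
        self.counts = [0] * n_arms
        self.t = 0

    def select(self):
        self.t += 1
        for k, n in enumerate(self.counts):
            if n == 0:                   # play any arm unseen in the window
                return k
        horizon = min(self.t, self.tau)
        def index(k):
            mean = self.sums[k] / self.counts[k]
            pad = self.b * math.sqrt(self.xi * math.log(horizon) / self.counts[k])
            return mean + pad
        return max(range(len(self.counts)), key=index)

    def update(self, arm, reward):
        self.history.append((arm, reward))
        self.sums[arm] += reward
        self.counts[arm] += 1
        if len(self.history) > self.tau:
            old_arm, old_reward = self.history.popleft()
            self.sums[old_arm] -= old_reward
            self.counts[old_arm] -= 1
```

D-UCB replaces the hard window with exponential discounting of past rewards; both trade a small loss on stationary segments for the ability to forget pre-change data.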

Journal title:
  • CoRR

Volume: abs/1204.1624

Publication date: 2012